Abstract

Isabelle/PIDE is the current Prover IDE technology for Isabelle. It has been
developed in ML and Scala in the past 4-5 years for this particular proof
assistant, but with an open mind towards other systems. PIDE is based on an
asynchronous document model, where the prover receives edits continuously and
updates its internal state accordingly. The interpretation of edits and the
policies for proof document processing are determined by the prover. The
editor front-end merely takes care of visual rendering of formal document
content.

Here we report on an experiment to connect Coq to the PIDE infrastructure of
Isabelle. This requires re-implementing the core PIDE protocol layer of
Isabelle/ML in OCaml. The payload for semantic processing of proof document
content is restricted to lexical analysis in the sense of existing Coqide
functionality. This is sufficient as a proof of concept for PIDE connectivity.
Actual proof processing is then a matter of improving Coq towards timeless and
stateless proof processing, independently of PIDE technicalities. The
implementation worked out smoothly and required minimal changes to the refined
PIDE architecture of Isabelle2013.

This experiment substantiates PIDE as a general approach to prover interaction.
It illustrates how other provers of the greater ITP family can participate by
following similar reforms of the classic TTY loop as was done for Isabelle in
the past few years.

1 Motivation

Is interactive theorem proving inherently tied to the command-line? Is the
Proof General wrapper for that command-line the optimum of what can be
achieved? Can we ever go beyond it conceptually and technologically?

The PIDE (Prover IDE) approach challenges the predominance of Proof
General [2] and its many clones like CoqIde, Matita [1], Proofweb [4]. PIDE
is centered around general principles of document-oriented asynchronous
interaction, possibly with parallel processing on the prover side [7, 9, 10].
It has required several years to reach the first stable release of the
Isabelle/jEdit Prover IDE in October 2011 [8]. Isabelle2013 (February 2013)
includes the third stable release of Isabelle/jEdit, and is already the
default instead of Proof General (even just for the pragmatic reason that it
is more likely to work out-of-the-box).



Can other provers join this movement? The present paper reports on an
experiment called CoqPIDE that exchanges the back-end of Isabelle/jEdit to
use Coq instead of Isabelle (see
https://bitbucket.org/makarius/coq-pide/src/443d088a72e6/README.PIDE?at=v8.4
from January 2013).



The common language of the PIDE prover integration is Scala [5] , but the 
prover needs to implement certain operations to support document-oriented 
interaction natively in its own language, which is OCaml for Coq. 

Scala provides immediate access to existing Java IDE frameworks. Our 
standard application uses jEdit, but Eclipse, Netbeans, IntelliJ IDEA are in 
principle possible as well. The JVM is also a good platform for advanced web 
services. 

In the past, OCaml was quite capable of integrating mainstream C libraries,
such as GTK for GUI components, although that is now outdated. OCaml/GTK
was used to implement Coqide within Coq itself, but GTK does not yet provide
a full-scale IDE, not even an able text editor.

The Java platform in general, and jEdit in particular, are not free from 
technical problems, but Isabelle/jEdit shows how these raw industrial materials 
can be adopted for our provers. Scala is particularly helpful to make the JVM 
platform accessible to the higher-order functional culture of proof assistants. 

The main results of the CoqPIDE experiment are as follows: 

Universality. The requirements for the prover to implement the PIDE docu- 
ment model are easily met by porting existing SML implementations to 
OCaml. Note that PIDE defines only rather general principles of docu- 
ment editing, leaving most of the details to the prover. 

Clarity. The minimal PIDE protocol implementation for OCaml/Coq helps 
to explain how PIDE actually works, without the additional layers of so- 
phistication and performance tuning that have accumulated in Isabelle 
already. 

Frugality. A meaningful application of PIDE to a different prover merely re- 
quires 40 kB of sources in OCaml (26 kB) and Scala (14 kB). Approx. 50% 
is for the implementation of PIDE datatypes and protocol operations, the 
other 50% Coq-specific "payload". For more serious semantic process- 
ing on the prover side, the payload will grow beyond this initial PIDE 
configuration. 



2 PIDE Document Operations 

The PIDE document model maintains sources (produced by the editor) and
resulting formal content (produced by the prover). Its programming interface
consists of statically-typed Scala operations.
Document.update(old_version, new_version, edits) applies source edits to turn
one version non-destructively into another.
Document.remove_versions(versions) indicates obsolete versions to allow
garbage collection eventually.

Document update is declarative: it specifies where to insert or remove parts of
the source text, but its operational consequences are determined by the prover.
Each document version is associated with an execution in ML to work out these
formal details, and report results back to the Scala side. PIDE provides
operational hints to improve performance, like Document.discontinue_execution()
and Document.cancel_execution() in certain situations of its editing pipeline.
Cancellation should cause some physical interrupt within the prover, which is
important for long-running proof checking, but CoqPIDE ignores this for
now.

Note that the PIDE model is inherently asynchronous: the front-end never 
waits for the back-end. Uninterruptible execution could mean (infinitely) long 
delay of the update of formal annotations seen in the editor buffer, but the 
Prover IDE does not block, nor lock text in the manner of Proof General. 



3 PIDE Protocol Implementation (OCaml) 

The PIDE protocol merely propagates tree-structured datatypes between Scala
and ML, but various details need to be observed to make it robust, efficient,
and portable. It is a bit like plumbing different kinds of metal: a leaden JVM
with an aluminium ML system, using a copper pipe. The PIDE protocol stack
has evolved over several years to the "proven technology" in Isabelle2013. For
CoqPIDE, we re-implemented the ML side in OCaml, which turned out to be a
simple programming exercise of a few days (including learning some OCaml in
the first place). The main protocol layers are as follows.

Bidirectional byte-channel. PIDE demands a bidirectional communication 
channel based on clean byte-streams, with block-buffering and high throughput. 
On Unix this can be implemented by a pair of fifos (named pipes), which are 
opened like a regular file on each side, in the correct order for rendezvous. PIDE
on Windows uses TCP sockets instead, but they turn out slightly less efficient 
and less robust on some ML versions. For CoqPIDE we use fifos and thus 
restrict it to Unix for now, although OCaml sockets probably work as well. 

Content sent over the byte-channel is partitioned into chunks with explicit 
length indication (encoded via ASCII digits followed by newline). This depends
on the assumption that the channel is private to the protocol handler. 
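
The chunk framing just described can be sketched as follows. This is only a minimal illustration under simplifying assumptions: each chunk carries a single decimal length header, a Buffer and a string stand in for the actual fifo, and the names write_chunk and read_chunk are hypothetical (they are not taken from the CoqPIDE sources).

```ocaml
(* Hypothetical sketch: frame = decimal length, newline, payload bytes. *)

(* Append one framed chunk to an output buffer. *)
let write_chunk (out : Buffer.t) (payload : string) : unit =
  Buffer.add_string out (string_of_int (String.length payload));
  Buffer.add_char out '\n';
  Buffer.add_string out payload

(* Read one framed chunk from [src] starting at [pos];
   returns the payload and the position just after it. *)
let read_chunk (src : string) (pos : int) : string * int =
  let nl =
    try String.index_from src pos '\n'
    with Not_found -> failwith "missing chunk header"
  in
  let len =
    match int_of_string_opt (String.sub src pos (nl - pos)) with
    | Some n when n >= 0 -> n
    | _ -> failwith "bad chunk length"
  in
  if nl + 1 + len > String.length src then failwith "truncated chunk";
  (String.sub src (nl + 1) len, nl + 1 + len)

let () =
  let buf = Buffer.create 64 in
  write_chunk buf "hello\nworld";  (* payload may itself contain newlines *)
  write_chunk buf "";
  let wire = Buffer.contents buf in
  let (a, p) = read_chunk wire 0 in
  let (b, q) = read_chunk wire p in
  assert (a = "hello\nworld" && b = "" && q = String.length wire)
```

Since the length is known before the payload is read, the payload needs no escaping. This is why the scheme only works on a channel that is private to the protocol handler.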

Note that public stdout cannot be used, because it is subject to spurious
output by parts of the ML process (runtime system, libraries) beyond our
control. Classic Proof General [2] avoids this problem by using human-readable
control commands and asking the user for manual repairs when the protocol
loses synchronization. This no longer worked for the PGIP/XML protocol [3],
so it was suffering from breakdown caused by unexpected diagnostic messages.

Text encoding and character positions. Text on the JVM consists of 16-bit
characters, but requires one or two such characters to represent a single
Unicode 6 codepoint according to UTF-16. ML usually prefers some extension
of ASCII, formerly ISO-latin, now UTF-8 where multi-character encodings are
commonplace. In any case, logical text addressing needs to agree on both sides
to attach error messages or other formal markup precisely to source positions.
PIDE standardizes towards UTF-8 on the prover side and recodes text to
UTF-16 for Scala/JVM. Physical text addressing works via byte offsets in ML,
and character offsets on the JVM. Logical text positions are either translated
explicitly by functions provided by the prover, or represented in a way that is
invariant wrt. the encoding (like Isabelle symbols [8; §2.1]). For CoqPIDE we
have re-used byte_offset_to_char_offset from Coqide, which works for the
Basic Multilingual Plane where UTF-16 requires only one 16-bit character.
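
To illustrate the translation of text positions, here is a small sketch that maps a byte offset in UTF-8 text to an offset counted in 16-bit UTF-16 units. It is not the Coqide function mentioned above: the name utf16_offset_of_byte_offset is hypothetical, and the code assumes well-formed UTF-8 input. Unlike the BMP-only variant, it counts a 4-byte sequence as two units (a surrogate pair).

```ocaml
(* Hypothetical sketch, not the Coqide function: translate a byte offset in
   well-formed UTF-8 text into an offset counted in UTF-16 code units. *)
let utf16_offset_of_byte_offset (s : string) (byte_offset : int) : int =
  let limit = min byte_offset (String.length s) in
  let count = ref 0 in
  for i = 0 to limit - 1 do
    let b = Char.code s.[i] in
    if b land 0xC0 = 0x80 then ()   (* continuation byte: no new unit *)
    else if b >= 0xF0 then count := !count + 2   (* 4-byte sequence: surrogate pair *)
    else incr count   (* ASCII, or lead byte of a 2-/3-byte sequence (BMP) *)
  done;
  !count

let () =
  (* "a", U+00E9 (2 bytes in UTF-8), "b" *)
  let s = "a\xC3\xA9b" in
  assert (utf16_offset_of_byte_offset s 1 = 1);
  assert (utf16_offset_of_byte_offset s 3 = 2);
  assert (utf16_offset_of_byte_offset s 4 = 3)
```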

YXML transfer syntax of untyped trees. PIDE uses untyped XML trees 
for document markup (and arbitrary ML/Scala values), but ignores the com- 
plications of official XML syntax and various attempts at XML type-systems. 
Instead, the markup tree structure over the text is represented by two special 
control characters that are outside the text range of XML 1.0 and what provers 
normally use. In contrast to official XML syntax, this avoids quoting of the text 
and allows cumulative markup of text that might have been marked already. 

Our XML transfer syntax is called YXML (pronounced as "Why XML?"),
see also [8; §2.3]. Efficient and robust YXML parsing is easily implemented
in any programming language. What is also notable about YXML is that it
is orthogonal to UTF-8 text encoding: the operations to decode text and to
recover tree structure can be commuted by the PIDE infrastructure as required.

The OCaml version of YXML is a literal translation of the SML version
from Isabelle, see also https://bitbucket.org/makarius/yxml and appendix A.1.
Note that Coqide uses some XML Light implementation instead, which suffers
from typical problems with boundary cases of standard XML (e.g. incorrect
treatment of white-space).
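
As a concrete illustration of the transfer syntax, the following self-contained sketch prints a small markup tree in YXML form. It uses the control characters '\005' and '\006' from appendix A.1 as X and Y, but the local tree type and the function names yxml_of_tree and yxml_string are simplified, hypothetical stand-ins for the library code.

```ocaml
(* Minimal sketch of YXML output, with X = '\005' and Y = '\006'. *)
type tree =
  | Elem of (string * (string * string) list) * tree list
  | Text of string

let x = "\005" and y = "\006"

let rec yxml_of_tree (buf : Buffer.t) : tree -> unit = function
  | Text s -> Buffer.add_string buf s   (* text is copied verbatim *)
  | Elem ((name, atts), body) ->
      Buffer.add_string buf (x ^ y ^ name);
      List.iter (fun (a, v) -> Buffer.add_string buf (y ^ a ^ "=" ^ v)) atts;
      Buffer.add_string buf x;
      List.iter (yxml_of_tree buf) body;
      Buffer.add_string buf (x ^ y ^ x)

let yxml_string (t : tree) : string =
  let buf = Buffer.create 64 in
  yxml_of_tree buf t;
  Buffer.contents buf

let () =
  let t = Elem (("keyword", [("kind", "command")]), [Text "Lemma <foo>"]) in
  assert (yxml_string t =
    "\005\006keyword\006kind=command\005Lemma <foo>\005\006\005")
```

Note that the angle brackets in the text pass through unchanged: nothing needs to be quoted, as long as the text itself does not contain the two control characters.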

XML/ML data representation. The algebraic datatypes that PIDE transfers
between Scala and ML may consist of base types like bool, int, string (for
text in the above sense), product types (tuples or records), variant types (disjoint
sums), and recursion over the same. Imitating the canonical memory layout of
ML values in untyped memory, we provide ML functions and combinators for
each of these type constructions wrt. raw XML trees:

  type 'a Encode.t = 'a -> XML.tree list
  Encode.string: string Encode.t
  Encode.pair: 'a Encode.t -> 'b Encode.t -> ('a * 'b) Encode.t
  Encode.list: 'a Encode.t -> 'a list Encode.t

Etc., also with symmetric versions for Decode. The modules XML.Encode
and XML.Decode are available on the Scala side as well. The implementation
is mostly trivial, consisting of a few lines for each combinator. Note
that it is important to work recursively with XML.tree list and cope with
its "mixed content" of alternating XML.Elem and XML.Text nodes. See also
https://bitbucket.org/makarius/yxml and appendix A.2.



Each PIDE protocol function is wrapped into a combinator expression over
XML.Encode and XML.Decode that is isomorphic to the corresponding datatype
definitions of its arguments. This slight redundancy is isolated in a single place
in Scala and ML, respectively. Public interfaces on each side are statically typed.
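
The following sketch shows how such a pair of combinator expressions fits together for one argument type. It is a cut-down, self-contained variant of the modules in appendix A.2, restricted to string, int, pair and list. The exception name Bad_body and the example type (a file name with a list of numbers) are assumptions made for illustration only.

```ocaml
(* Sketch of typed encode/decode combinators over untyped trees. *)
type tree =
  | Elem of (string * (string * string) list) * tree list
  | Text of string
type body = tree list

module Encode = struct
  type 'a t = 'a -> body
  let node ts = Elem ((":", []), ts)
  let string : string t = function "" -> [] | s -> [Text s]
  let int : int t = fun i -> string (string_of_int i)
  let pair (f : 'a t) (g : 'b t) : ('a * 'b) t =
    fun (x, y) -> [node (f x); node (g y)]
  let list (f : 'a t) : 'a list t = List.map (fun x -> node (f x))
end

module Decode = struct
  type 'a t = body -> 'a
  exception Bad_body of body
  let node = function Elem ((":", []), ts) -> ts | t -> raise (Bad_body [t])
  let string : string t = function
    | [] -> ""
    | [Text s] -> s
    | ts -> raise (Bad_body ts)
  let int : int t = fun ts ->
    match int_of_string_opt (string ts) with
    | Some i -> i
    | None -> raise (Bad_body ts)
  let pair (f : 'a t) (g : 'b t) : ('a * 'b) t = function
    | [t1; t2] -> (f (node t1), g (node t2))
    | ts -> raise (Bad_body ts)
  let list (f : 'a t) : 'a list t = List.map (fun t -> f (node t))
end

(* One combinator expression per datatype, mirrored on both sides. *)
let encode = Encode.(pair string (list int))
let decode = Decode.(pair string (list int))

let () =
  let value = ("foo.v", [1; 20; 300]) in
  assert (decode (encode value) = value)
```

The two expressions have the same shape as the type string * int list, which is what keeps the untyped intermediate representation out of the public interfaces.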



4 Coq-specific PIDE Modules (OCaml and Scala)

After studying the Coq sources for a few days, to see what is already there to 
serve our purpose, we have chosen lexical syntax processing of Coqide: it is used 
for syntax highlighting in its GTK text widget. 

[Screenshot: CoqPIDE displaying a Coq source file (lemma le_elim_rel with its
proof script) with syntax highlighting in the style of Coqide]

The result looks as shown in the given screenshot, but that is already CoqPIDE
imitating the look-and-feel of Coqide! This minimal PIDE application has required
the following additional modules.

Coq document structure. CoqPIDE implements the main PIDE document
operations using a very simple document model in OCaml: an association list of
named nodes (.v files), each consisting of a list of entries (like command spans
in Proof General). The Scala-side of CoqPIDE does not even exploit the full
structure yet: it merely turns each file into one monolithic command span. This
is sufficient to work efficiently with files of 10-100 kB size. Editing larger sources
requires using more sub-structure, as is done routinely in Isabelle/PIDE.
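
A document model of this kind can be sketched in a few lines. The code below is a hypothetical illustration and not the CoqPIDE implementation: the type and function names, the use of integer version identifiers, and the choice that an edit replaces all entries of one node are assumptions made for the example.

```ocaml
(* Hypothetical sketch of a minimal document model: versions are immutable
   association lists of named nodes, each holding a list of entries. *)
type entry = string                  (* source text of one command span *)
type node = entry list
type version = (string * node) list  (* node name (.v file) -> entries *)

(* Assumption: an edit replaces all entries of one named node. *)
type edit = string * node

let apply_edit (doc : version) ((name, entries) : edit) : version =
  (name, entries) :: List.remove_assoc name doc

(* Table of all known versions, indexed by version id. *)
let versions : (int, version) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace versions 0 []

(* Non-destructive update: the old version stays available. *)
let update (old_id : int) (new_id : int) (edits : edit list) : unit =
  let old_doc = Hashtbl.find versions old_id in
  Hashtbl.replace versions new_id (List.fold_left apply_edit old_doc edits)

(* Drop obsolete versions so that they can be garbage-collected. *)
let remove_versions (ids : int list) : unit =
  List.iter (Hashtbl.remove versions) ids

let () =
  update 0 1 [("A.v", ["Lemma foo : True."])];
  update 1 2 [("A.v", ["Lemma foo : True."; "Proof. auto. Qed."])];
  assert (List.assoc "A.v" (Hashtbl.find versions 1) = ["Lemma foo : True."]);
  remove_versions [0; 1];
  assert (Hashtbl.mem versions 2 && not (Hashtbl.mem versions 1))
```

Because versions are persistent values, an execution that still refers to an old version is not disturbed by later edits. It merely becomes obsolete once the front-end removes that version.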

Coq markup and rendering. PIDE allows the prover to annotate source text 
by arbitrary formal content, which is represented by untyped and uninterpreted 
XML trees. The prover may define its own vocabulary of markup elements, 
together with an interpretation that is called rendering. The rendering module 
provides functions to turn XML trees that are associated with given text ranges 
into GUI elements: colors, boxes, squiggles, icons, tooltips, hyperlinks etc. 

CoqPIDE defines markup elements for the main lexical categories of Coq, 
with rendering that uses the original Coqide colors. There are two tiny ad- 
ditions: the dot as proof-script terminator is painted red, and quoted strings 
are rendered with transparency (alpha channel) which is now common-place in 
Isabelle/PIDE — it helps to combine multiple layers of prover markup system- 
atically. 

Coq theory syntax. PIDE allows the prover to define aspects of its syntax 
directly on the editor side, e.g. to tell which files are special (.v for Coq) and 
how files are loaded for the prover (sources are managed by the front-end). 
CoqPIDE only provides the bare minimum to make the system work. Extra
efforts would be required to imitate Isabelle/PIDE, which allows augmenting
theory syntax while editing, and resolves file dependencies automatically. The
latter is important for applications consisting of several modules that are edited 
simultaneously. Note that neither Coqide nor Proof General allow multiple 
"active" buffers. 

Of course we could have augmented jEdit directly by a small Coq lexer in
Scala to get plain syntax highlighting. Since CoqPIDE follows the architecture
of PIDE, with its asynchronous exchange of edits that are interpreted by the
prover, it may serve as a starting point for actual proof processing. Some of the
required reforms to the Coq toplevel have been explored already [6]. Eventually
we shall see these movements converge to more serious PIDE/Coq integration.
Isabelle shares the roots of TTY-based command-line interaction with Coq
and other members of the LCF family, but managed to reform itself towards
full-scale Prover IDE support within a few years. It should be feasible to transfer
more of what has been achieved here to other provers, unless the command-line
really turns out to be part of the very essence of interactive theorem proving.



A OCaml sources of the YXML library

A.1 YXML transfer syntax

(*
Efficient text representation of XML trees using extra characters X
and Y -- no escaping, may nest marked text verbatim.  Suitable for
direct inlining into plain text.

Markup <elem att="val" ...>...body...</elem> is encoded as:

  X Y name Y att=val ... X
  body
  X Y X
*)

module type YXML =
sig
  val char_X: char
  val char_Y: char
  val no_output: string * string
  val output_markup: string * XML.attributes -> string * string
  val string_of_body: XML.body -> string
  val string_of: XML.tree -> string
  val parse_body: string -> XML.body
  val parse: string -> XML.tree
end

module YXML: YXML =
struct

(* markers *)

let char_X = '\005'
let char_Y = '\006'

let str_X = String.make 1 char_X
let str_Y = String.make 1 char_Y

let str_XY = str_X ^ str_Y
let str_XYX = str_XY ^ str_X

let detect s = String.contains s char_X || String.contains s char_Y


(* ML basics *)

let (|>) x f = f x
let (@>) f g x = g (f x)

let rec fold f list y =
  match list with
    [] -> y
  | x :: xs -> fold f xs (f x y)


(* output *)

let implode = String.concat ""
let content xs = implode (List.rev xs)
let add x xs = if x = "" then xs else x :: xs

let no_output = ("", "")

let output_markup (name, atts) =
  if name = "" then no_output
  else
    (str_XY ^ name ^
      implode (List.map (fun (a, x) -> str_Y ^ a ^ "=" ^ x) atts) ^ str_X,
     str_XYX)

let string_of_body body =
  let attrib (a, x) = add str_Y @> add a @> add "=" @> add x in
  let rec tree = function
    | XML.Elem ((name, atts), ts) ->
        add str_XY @> add name @> fold attrib atts @> add str_X @>
        trees ts @>
        add str_XYX
    | XML.Text s -> add s
  and trees ts = fold tree ts
  in content (trees body [])

let string_of tree = string_of_body [tree]

(** parsing *)

(* split *)

let split fields sep str =
  let cons i n result =
    if i = 0 && n = String.length str && n > 0 then str :: result
    else if n > 0 then String.sub str i n :: result
    else if fields then "" :: result
    else result
  in
  let rec explode i result =
    let j = try String.index_from str i sep with Not_found -> -1 in
    if j >= 0 then explode (j + 1) (cons i (j - i) result)
    else List.rev (cons i (String.length str - i) result)
  in explode 0 []


(* parse *)

let err msg = raise (Failure ("Malformed YXML: " ^ msg))
let err_attribute () = err "bad attribute"
let err_element () = err "bad element"
let err_unbalanced name =
  if name = "" then err "unbalanced element"
  else err ("unbalanced element \"" ^ name ^ "\"")

let parse_attrib s =
  try
    let i = String.index s '=' in
    let _ = if i = 0 then err_attribute () in
    let j = i + 1 in
    (String.sub s 0 i, String.sub s j (String.length s - j))
  with Not_found -> err_attribute ()

let parse_body source =

  (* stack operations *)

  let add x ((elem, body) :: pending) = (elem, x :: body) :: pending
  in

  let push name atts pending =
    if name = "" then err_element ()
    else ((name, atts), []) :: pending
  in

  let pop (((name, _) as markup, body) :: pending) =
    if name = "" then err_unbalanced ""
    else add (XML.Elem (markup, List.rev body)) pending
  in

  (* parse chunks *)

  let chunks = split false char_X source |> List.map (split true char_Y) in

  let parse_chunk = function
    | [""; ""] -> pop
    | ("" :: name :: atts) -> push name (List.map parse_attrib atts)
    | txts -> fold (fun s -> add (XML.Text s)) txts
  in
  match fold parse_chunk chunks [(("", []), [])] with
  | [(("", _), result)] -> List.rev result
  | ((name, _), _) :: _ -> err_unbalanced name

let parse source =
  match parse_body source with
  | [result] -> result
  | [] -> XML.Text ""
  | _ -> err "multiple results"

end



A.2 Untyped XML trees and typed representation of ML values



module type XML_Data_Ops =
sig
  type 'a a
  type 'a t
  type 'a v
  val int_atom: int a
  val bool_atom: bool a
  val unit_atom: unit a
  val properties: (string * string) list t
  val string: string t
  val int: int t
  val bool: bool t
  val unit: unit t
  val pair: 'a t -> 'b t -> ('a * 'b) t
  val triple: 'a t -> 'b t -> 'c t -> ('a * 'b * 'c) t
  val list: 'a t -> 'a list t
  val option: 'a t -> 'a option t
  val variant: 'a v list -> 'a t
end



module type XML =
sig
  type attributes = (string * string) list
  type tree = Elem of ((string * attributes) * tree list) | Text of string
  type body = tree list
  exception XML_Atom of string
  exception XML_Body of tree list
  module Encode: XML_Data_Ops with
    type 'a a = 'a -> string and
    type 'a t = 'a -> body and
    type 'a v = 'a -> string list * body
  module Decode: XML_Data_Ops with
    type 'a a = string -> 'a and
    type 'a t = body -> 'a and
    type 'a v = string list * body -> 'a
end



module XML: XML =
struct

type attributes = (string * string) list
type tree = Elem of ((string * attributes) * tree list) | Text of string
type body = tree list

let map_index f =
  let rec mapp i = function
    | [] -> []
    | x :: xs -> f (i, x) :: mapp (i + 1) xs
  in mapp 0

exception XML_Atom of string
exception XML_Body of tree list


module Encode =
struct

type 'a a = 'a -> string
type 'a t = 'a -> body
type 'a v = 'a -> string list * body


(* atomic values *)

let int_atom = string_of_int

let bool_atom = function false -> "0" | true -> "1"

let unit_atom () = ""


(* structural nodes *)

let node ts = Elem ((":", []), ts)

let vector = map_index (fun (i, x) -> (int_atom i, x))

let tagged (tag, (xs, ts)) = Elem ((int_atom tag, vector xs), ts)


(* representation of standard types *)

let properties props = [Elem ((":", props), [])]

let string = function "" -> [] | s -> [Text s]

let int i = string (int_atom i)

let bool b = string (bool_atom b)

let unit () = string (unit_atom ())

let pair f g (x, y) = [node (f x); node (g y)]

let triple f g h (x, y, z) = [node (f x); node (g y); node (h z)]

let list f xs = List.map (fun x -> node (f x)) xs

let option f = function None -> [] | Some x -> [node (f x)]

let variant fns x =
  let rec get_index i = function
    | [] -> raise (Failure "XML.Encode.variant")
    | f :: fs -> try (i, f x) with Match_failure _ -> get_index (i + 1) fs
  in [tagged (get_index 0 fns)]

end



module Decode =
struct

type 'a a = string -> 'a
type 'a t = body -> 'a
type 'a v = string list * body -> 'a


(* atomic values *)

let int_atom s =
  try int_of_string s
  with Failure _ -> raise (XML_Atom s)  (* int_of_string raises Failure *)

let bool_atom = function
  | "0" -> false
  | "1" -> true
  | s -> raise (XML_Atom s)

let unit_atom s =
  if s = "" then () else raise (XML_Atom s)


(* structural nodes *)

let node = function
  | Elem ((":", []), ts) -> ts
  | t -> raise (XML_Body [t])

let vector =
  map_index (function (i, (a, x)) ->
    if int_atom a = i then x else raise (XML_Atom a))

let tagged = function
  | Elem ((name, atts), ts) -> (int_atom name, (vector atts, ts))
  | t -> raise (XML_Body [t])


(* representation of standard types *)

let properties = function
  | [Elem ((":", props), [])] -> props
  | ts -> raise (XML_Body ts)

let string = function
  | [] -> ""
  | [Text s] -> s
  | ts -> raise (XML_Body ts)

let int ts = int_atom (string ts)

let bool ts = bool_atom (string ts)

let unit ts = unit_atom (string ts)

let pair f g = function
  | [t1; t2] -> (f (node t1), g (node t2))
  | ts -> raise (XML_Body ts)

let triple f g h = function
  | [t1; t2; t3] -> (f (node t1), g (node t2), h (node t3))
  | ts -> raise (XML_Body ts)

let list f = List.map (fun t -> f (node t))

let option f = function
  | [] -> None
  | [t] -> Some (f (node t))
  | ts -> raise (XML_Body ts)

let variant fs = function
  | [t] ->
      let (tag, (xs, ts)) = tagged t in
      (* List.nth raises Failure if the list is too short,
         Invalid_argument if the index is negative *)
      let f = try List.nth fs tag
        with Failure _ | Invalid_argument _ -> raise (XML_Body [t]) in
      f (xs, ts)
  | ts -> raise (XML_Body ts)

end

end